{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# <center>RegEx in Python</center>\n",
    "\n",
    "![](images/memes/meme21.jpg)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Compilation Flags\n",
    "\n",
    "- When compiling a pattern string into a pattern object, it's possible to **modify the standard behavior of the patterns** using **Compilation Flags**.\n",
    "\n",
    "- Multiple compilation flags can be combined using the bitwise OR \"|\".\n",
    "\n",
    "Here is a list of all the complation flags:\n",
    "\n",
    "<table style=\"border: 1px solid black; font-size:15px;\">\n",
    "<thead>\n",
    "    <th>Syntax</th>\n",
    "    <th>Meaning</th>\n",
    "</thead>\n",
    "    \n",
    "<tbody>\n",
    "<tr>\n",
    "    <td>re.IGNORECASE or re.I</td>\n",
    "    <td>ignore case.</td>\n",
    "</tr>\n",
    "\n",
    "<tr>\n",
    "    <td>re.MULTILINE or re.M</td>\n",
    "    <td>make begin/end boundary matchers (^, $) consider each line.</td>\n",
    "</tr>\n",
    "\n",
    "<tr>\n",
    "    <td>re.DOTALL or re.S</td>\n",
    "    <td>make . match newline too.</td>\n",
    "</tr>\n",
    "\n",
    "<tr>\n",
    "    <td>re.UNICODE or re.U</td>\n",
    "    <td>make {\\w, \\W, \\b, \\B} follow Unicode rules.</td>\n",
    "</tr>\n",
    "\n",
    "<tr>\n",
    "    <td>re.LOCALE or re.L</td>\n",
    "    <td>make {\\w, \\W, \\b, \\B} follow locale.</td>\n",
    "</tr>\n",
    "\n",
    "<tr>\n",
    "    <td>re.ASCII or re.A</td>\n",
    "    <td>make {\\w, \\W, \\b, \\B} perform ASCII-only matching.</td>\n",
    "</tr>\n",
    "\n",
    "<tr>\n",
    "    <td>re.VERBOSE or re.X</td>\n",
    "    <td>allow comment in regex.</td>\n",
    "</tr>\n",
    "\n",
    "<tr>\n",
    "    <td>re.DEBUG</td>\n",
    "    <td>get information about the compilation pattern.</td>\n",
    "</tr>\n",
    "</tbody>\n",
    "</table>\n",
    "\n",
    "Let's go through each one of them one by one.\n",
    "\n",
    "## 1. re.IGNORECASE or re.I\n",
    "\n",
    "This flag makes a regex pattern case-insensitive.\n",
    "\n",
    "\n",
    "Let's check out an example to find all occurances of `the` and `The` in the given text."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import re\n",
    "from utils import highlight_regex_matches"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "txt = \"\"\"\n",
    "The best thing about regex is that it makes the task of string manipulation so easy.\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "pattern = re.compile(\"the\", flags=re.I)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "re.compile(r'the', re.IGNORECASE|re.UNICODE)"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pattern"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\u001b[43m\u001b[1mThe\u001b[0m best thing about regex is that it makes \u001b[43m\u001b[1mthe\u001b[0m task of string manipulation so easy.\n",
      "\n"
     ]
    }
   ],
   "source": [
    "highlight_regex_matches(pattern, txt)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. re.MULTILINE or re.M\n",
    "\n",
    "This flag is used to make begin/end boundary matchers (`^`, `$`) consider each line of the given text.\n",
    "\n",
    "\n",
    "Let's check out an example to find all lines starting with `A`.  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "txt = \"\"\"\n",
    "A man was crossing the road.\n",
    "Suddenly, a car passed before him in a very high speed.\n",
    "He was terrified\n",
    "And shocked.\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "pattern = re.compile(\"^A.+\", flags=re.M)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\u001b[43m\u001b[1mA man was crossing the road.\u001b[0m\n",
      "Suddenly, a car passed before him in a very high speed.\n",
      "He was terrified\n",
      "\u001b[43m\u001b[1mAnd shocked.\u001b[0m\n",
      "\n"
     ]
    }
   ],
   "source": [
    "highlight_regex_matches(pattern, txt)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. re.DOTALL or re.S\n",
    "\n",
    "The `.` metacharacter matches everything except newline character. If we want to make `.` match newline too, we have to set this flag.\n",
    "\n",
    "Let's consider an examle to match all the text after (and including) `car`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "pattern = re.compile(\"car.+\", flags=re.S)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "A man was crossing the road.\n",
      "Suddenly, a \u001b[43m\u001b[1mcar passed before him in a very high speed.\n",
      "He was terrified\n",
      "And shocked.\n",
      "\u001b[0m\n"
     ]
    }
   ],
   "source": [
    "highlight_regex_matches(pattern, txt)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. re.UNICODE or re.U\n",
    "\n",
    "Using this flag, we can make the pattern characters `{\\w, \\W, \\b, \\B}` dependent on the Unicode character properties database.\n",
    "\n",
    "> re.UNICODE is the default flag in Python 3 regex patterns.\n",
    "\n",
    "Let's consider an example where we try to work on hindi language."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "txt = \"मुझे किताबें पढ़ना बहुत पसंद है।\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "pattern = re.compile(\"\\w+\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['म', 'झ', 'क', 'त', 'ब', 'पढ', 'न', 'बह', 'त', 'पस', 'द', 'ह']"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pattern.findall(txt)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[Solution](https://stackoverflow.com/questions/12746458/python-unicode-regular-expression-matching-failing-with-some-unicode-characters/12747529#12747529)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "import regex"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "pattern = regex.compile(\"\\w+\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['मुझे', 'किताबें', 'पढ़ना', 'बहुत', 'पसंद', 'है']"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pattern.findall(txt)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5. re.LOCALE or re.L\n",
    "\n",
    "> A locale is a set of environmental variables that defines the language, country, and character encoding settings (or any other special variant preferences) for your applications.\n",
    "\n",
    "This flag will make the word pattern `{\\w, \\W}` and boundary pattern `{\\b, \\B}`, dependent on the current locale. \n",
    "\n",
    "<span style=\"color:red;\">**The use of this flag is discouraged in Python 3 as the locale mechanism is very unreliable, it only handles one “culture” at a time, and it only works with 8-bit locales. Unicode matching is already enabled by default in Python 3 for Unicode (str) patterns, and it is able to handle different locales/languages.**</span>\n",
    "\n",
    "\n",
    "## 6. re.ASCII or re.A\n",
    "\n",
    "This flag will make the word pattern `{\\w, \\W}` and boundary pattern `{\\b, \\B}` perform ASCII-only matching, i.e. only A-Z, a-z, 0-9 will be considered alphanumeric characters. \n",
    "\n",
    "Let us see an example below:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "chars =  ''.join(chr(i) for i in range(256))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u0000\u0001\u0002\u0003\u0004\u0005\u0006\u0007\b\t\n",
      "\u000b",
      "\f",
      "\r",
      "\u000e\u000f\u0010\u0011\u0012\u0013\u0014\u0015\u0016\u0017\u0018\u0019\u001a\u001b\u001c",
      "\u001d",
      "\u001e",
      "\u001f !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~",
      " ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ\n"
     ]
    }
   ],
   "source": [
    "print(chars)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [],
   "source": [
    "pattern = re.compile(\"\\w\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u0000\u0001\u0002\u0003\u0004\u0005\u0006\u0007\b\t\n",
      "\u000b",
      "\f",
      "\r",
      "\u000e\u000f\u0010\u0011\u0012\u0013\u0014\u0015\u0016\u0017\u0018\u0019\u001a\u001b\u001c",
      "\u001d",
      "\u001e",
      "\u001f !\"#$%&'()*+,-./\u001b[43m\u001b[1m0\u001b[0m\u001b[43m\u001b[1m1\u001b[0m\u001b[43m\u001b[1m2\u001b[0m\u001b[43m\u001b[1m3\u001b[0m\u001b[43m\u001b[1m4\u001b[0m\u001b[43m\u001b[1m5\u001b[0m\u001b[43m\u001b[1m6\u001b[0m\u001b[43m\u001b[1m7\u001b[0m\u001b[43m\u001b[1m8\u001b[0m\u001b[43m\u001b[1m9\u001b[0m:;<=>?@\u001b[43m\u001b[1mA\u001b[0m\u001b[43m\u001b[1mB\u001b[0m\u001b[43m\u001b[1mC\u001b[0m\u001b[43m\u001b[1mD\u001b[0m\u001b[43m\u001b[1mE\u001b[0m\u001b[43m\u001b[1mF\u001b[0m\u001b[43m\u001b[1mG\u001b[0m\u001b[43m\u001b[1mH\u001b[0m\u001b[43m\u001b[1mI\u001b[0m\u001b[43m\u001b[1mJ\u001b[0m\u001b[43m\u001b[1mK\u001b[0m\u001b[43m\u001b[1mL\u001b[0m\u001b[43m\u001b[1mM\u001b[0m\u001b[43m\u001b[1mN\u001b[0m\u001b[43m\u001b[1mO\u001b[0m\u001b[43m\u001b[1mP\u001b[0m\u001b[43m\u001b[1mQ\u001b[0m\u001b[43m\u001b[1mR\u001b[0m\u001b[43m\u001b[1mS\u001b[0m\u001b[43m\u001b[1mT\u001b[0m\u001b[43m\u001b[1mU\u001b[0m\u001b[43m\u001b[1mV\u001b[0m\u001b[43m\u001b[1mW\u001b[0m\u001b[43m\u001b[1mX\u001b[0m\u001b[43m\u001b[1mY\u001b[0m\u001b[43m\u001b[1mZ\u001b[0m[\\]^\u001b[43m\u001b[1m_\u001b[0m`\u001b[43m\u001b[1ma\u001b[0m\u001b[43m\u001b[1mb\u001b[0m\u001b[43m\u001b[1mc\u001b[0m\u001b[43m\u001b[1md\u001b[0m\u001b[43m\u001b[1me\u001b[0m\u001b[43m\u001b[1mf\u001b[0m\u001b[43m\u001b[1mg\u001b[0m\u001b[43m\u001b[1mh\u001b[0m\u001b[43m\u001b[1mi\u001b[0m\u001b[43m\u001b[1mj\u001b[0m\u001b[43m\u001b[1mk\u001b[0m\u001b[43m\u001b[1ml\u001b[0m\u001b[43m\u001b[1mm\u001b[0m\u001b[43m\u001b[1mn\u001b[0m\u001b[43m\u001b[1mo\u001b[0m\u001b[43m\u001b[1mp\u001b[0m\u001b[43m\u001b[1mq\u001b[0m\u001b[43m\u001b[1mr\u001b[0m\u001b[43m\u001b[1ms\u001b[0m\u001b[43m\u001b[1mt\u001b[0m\u001b[43m\u001b[1mu\u001b[0m\u001b[43m\u001b[1mv\u001b[0m\u001b[43m\u001b[1mw\u001b[0m\u001b[43m\u001b[1mx\u001b[0m\u001b[43m\u001b[1my\u001b[0m\u001b[43m\u001b[1mz\u001b[0m{|}~",
      " ¡¢£¤¥¦§¨©\u001b[43m\u001b[1mª\u001b[0m«¬­®¯°±\u001b[43m\u001b[1m²\u001b[0m\u001b[43m\u001b[1m³\u001b[0m´\u001b[43m\u001b[1mµ\u001b[0m¶·¸\u001b[43m\u001b[1m¹\u001b[0m\u001b[43m\u001b[1mº\u001b[0m»\u001b[43m\u001b[1m¼\u001b[0m\u001b[43m\u001b[1m½\u001b[0m\u001b[43m\u001b[1m¾\u001b[0m¿\u001b[43m\u001b[1mÀ\u001b[0m\u001b[43m\u001b[1mÁ\u001b[0m\u001b[43m\u001b[1mÂ\u001b[0m\u001b[43m\u001b[1mÃ\u001b[0m\u001b[43m\u001b[1mÄ\u001b[0m\u001b[43m\u001b[1mÅ\u001b[0m\u001b[43m\u001b[1mÆ\u001b[0m\u001b[43m\u001b[1mÇ\u001b[0m\u001b[43m\u001b[1mÈ\u001b[0m\u001b[43m\u001b[1mÉ\u001b[0m\u001b[43m\u001b[1mÊ\u001b[0m\u001b[43m\u001b[1mË\u001b[0m\u001b[43m\u001b[1mÌ\u001b[0m\u001b[43m\u001b[1mÍ\u001b[0m\u001b[43m\u001b[1mÎ\u001b[0m\u001b[43m\u001b[1mÏ\u001b[0m\u001b[43m\u001b[1mÐ\u001b[0m\u001b[43m\u001b[1mÑ\u001b[0m\u001b[43m\u001b[1mÒ\u001b[0m\u001b[43m\u001b[1mÓ\u001b[0m\u001b[43m\u001b[1mÔ\u001b[0m\u001b[43m\u001b[1mÕ\u001b[0m\u001b[43m\u001b[1mÖ\u001b[0m×\u001b[43m\u001b[1mØ\u001b[0m\u001b[43m\u001b[1mÙ\u001b[0m\u001b[43m\u001b[1mÚ\u001b[0m\u001b[43m\u001b[1mÛ\u001b[0m\u001b[43m\u001b[1mÜ\u001b[0m\u001b[43m\u001b[1mÝ\u001b[0m\u001b[43m\u001b[1mÞ\u001b[0m\u001b[43m\u001b[1mß\u001b[0m\u001b[43m\u001b[1mà\u001b[0m\u001b[43m\u001b[1má\u001b[0m\u001b[43m\u001b[1mâ\u001b[0m\u001b[43m\u001b[1mã\u001b[0m\u001b[43m\u001b[1mä\u001b[0m\u001b[43m\u001b[1må\u001b[0m\u001b[43m\u001b[1mæ\u001b[0m\u001b[43m\u001b[1mç\u001b[0m\u001b[43m\u001b[1mè\u001b[0m\u001b[43m\u001b[1mé\u001b[0m\u001b[43m\u001b[1mê\u001b[0m\u001b[43m\u001b[1më\u001b[0m\u001b[43m\u001b[1mì\u001b[0m\u001b[43m\u001b[1mí\u001b[0m\u001b[43m\u001b[1mî\u001b[0m\u001b[43m\u001b[1mï\u001b[0m\u001b[43m\u001b[1mð\u001b[0m\u001b[43m\u001b[1mñ\u001b[0m\u001b[43m\u001b[1mò\u001b[0m\u001b[43m\u001b[1mó\u001b[0m\u001b[43m\u001b[1mô\u001b[0m\u001b[43m\u001b[1mõ\u001b[0m\u001b[43m\u001b[1mö\u001b[0m÷\u001b[43m\u001b[1mø\u001b[0m\u001b[43m\u001b[1mù\u001b[0m\u001b[43m\u001b[1mú\u001b[0m\u001b[43m\u001b[1mû\u001b[0m\u001b[43m\u001b[1mü\u001b[0m\u001b[43m\u001b[1mý\u001b[0m\u001b[43m\u001b[1mþ\u001b[0m\u001b[43m\u001b[1mÿ\u001b[0m\n"
     ]
    }
   ],
   "source": [
    "highlight_regex_matches(pattern, chars)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [],
   "source": [
    "pattern = re.compile(\"\\w\", flags=re.A)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u0000\u0001\u0002\u0003\u0004\u0005\u0006\u0007\b\t\n",
      "\u000b",
      "\f",
      "\r",
      "\u000e\u000f\u0010\u0011\u0012\u0013\u0014\u0015\u0016\u0017\u0018\u0019\u001a\u001b\u001c",
      "\u001d",
      "\u001e",
      "\u001f !\"#$%&'()*+,-./\u001b[43m\u001b[1m0\u001b[0m\u001b[43m\u001b[1m1\u001b[0m\u001b[43m\u001b[1m2\u001b[0m\u001b[43m\u001b[1m3\u001b[0m\u001b[43m\u001b[1m4\u001b[0m\u001b[43m\u001b[1m5\u001b[0m\u001b[43m\u001b[1m6\u001b[0m\u001b[43m\u001b[1m7\u001b[0m\u001b[43m\u001b[1m8\u001b[0m\u001b[43m\u001b[1m9\u001b[0m:;<=>?@\u001b[43m\u001b[1mA\u001b[0m\u001b[43m\u001b[1mB\u001b[0m\u001b[43m\u001b[1mC\u001b[0m\u001b[43m\u001b[1mD\u001b[0m\u001b[43m\u001b[1mE\u001b[0m\u001b[43m\u001b[1mF\u001b[0m\u001b[43m\u001b[1mG\u001b[0m\u001b[43m\u001b[1mH\u001b[0m\u001b[43m\u001b[1mI\u001b[0m\u001b[43m\u001b[1mJ\u001b[0m\u001b[43m\u001b[1mK\u001b[0m\u001b[43m\u001b[1mL\u001b[0m\u001b[43m\u001b[1mM\u001b[0m\u001b[43m\u001b[1mN\u001b[0m\u001b[43m\u001b[1mO\u001b[0m\u001b[43m\u001b[1mP\u001b[0m\u001b[43m\u001b[1mQ\u001b[0m\u001b[43m\u001b[1mR\u001b[0m\u001b[43m\u001b[1mS\u001b[0m\u001b[43m\u001b[1mT\u001b[0m\u001b[43m\u001b[1mU\u001b[0m\u001b[43m\u001b[1mV\u001b[0m\u001b[43m\u001b[1mW\u001b[0m\u001b[43m\u001b[1mX\u001b[0m\u001b[43m\u001b[1mY\u001b[0m\u001b[43m\u001b[1mZ\u001b[0m[\\]^\u001b[43m\u001b[1m_\u001b[0m`\u001b[43m\u001b[1ma\u001b[0m\u001b[43m\u001b[1mb\u001b[0m\u001b[43m\u001b[1mc\u001b[0m\u001b[43m\u001b[1md\u001b[0m\u001b[43m\u001b[1me\u001b[0m\u001b[43m\u001b[1mf\u001b[0m\u001b[43m\u001b[1mg\u001b[0m\u001b[43m\u001b[1mh\u001b[0m\u001b[43m\u001b[1mi\u001b[0m\u001b[43m\u001b[1mj\u001b[0m\u001b[43m\u001b[1mk\u001b[0m\u001b[43m\u001b[1ml\u001b[0m\u001b[43m\u001b[1mm\u001b[0m\u001b[43m\u001b[1mn\u001b[0m\u001b[43m\u001b[1mo\u001b[0m\u001b[43m\u001b[1mp\u001b[0m\u001b[43m\u001b[1mq\u001b[0m\u001b[43m\u001b[1mr\u001b[0m\u001b[43m\u001b[1ms\u001b[0m\u001b[43m\u001b[1mt\u001b[0m\u001b[43m\u001b[1mu\u001b[0m\u001b[43m\u001b[1mv\u001b[0m\u001b[43m\u001b[1mw\u001b[0m\u001b[43m\u001b[1mx\u001b[0m\u001b[43m\u001b[1my\u001b[0m\u001b[43m\u001b[1mz\u001b[0m{|}~",
      " ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ\n"
     ]
    }
   ],
   "source": [
    "highlight_regex_matches(pattern, chars)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7. re.VERBOSE or re.X\n",
    "\n",
    "This flag changes the regex syntax, to allow you to add annotations in regex. \n",
    "\n",
    "- Whitespace within the pattern is ignored, except when in a character class or preceded by an unescaped backslash.\n",
    "\n",
    "- When a line contains a # neither in a character class or preceded by an unescaped backslash, all characters from the leftmost such # through the end of the line are ignored."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [],
   "source": [
    "txt = \"\"\"\n",
    "This is a sample text123\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [],
   "source": [
    "pattern = re.compile(\"\\w +\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['s ', 's ', 'a ', 'e ']"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pattern.findall(txt)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [],
   "source": [
    "pattern = re.compile(\"\\w +  # find all words\", flags=re.X)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['This', 'is', 'a', 'sample', 'text123']"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pattern.findall(txt)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 8. re.DEBUG\n",
    "\n",
    "This flag when set, gives some information about the compilation pattern."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "LITERAL 8\n",
      "MAX_REPEAT 1 MAXREPEAT\n",
      "  IN\n",
      "    RANGE (97, 101)\n",
      "    RANGE (55, 57)\n",
      "LITERAL 8\n"
     ]
    }
   ],
   "source": [
    "pattern = re.compile(\"\\b[a-e7-9]+\\b\", flags=re.DEBUG)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![](images/memes/meme22.jpg)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
